{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "Writing Primer for Data Scientists.ipynb",
      "provenance": [],
      "include_colab_link": true
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/dair-ai/data_science_writing_primer/blob/master/Writing_Primer_for_Data_Scientists.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "tD-GvSpBxCDj",
        "colab_type": "text"
      },
      "source": [
        "## [Headline]\n",
        "\n",
        "Typically, when you are writing something technical, you want a headline that's *appealing*. This helps you reach a wider audience and gives your work more visibility and impact. The headline of your tutorial should be a short title (typically no more than 10 to 15 words). The title can include the technology and technique you are talking about, combined with a nice action verb that appeals to a wider audience. For instance, I titled one of my [articles](https://medium.com/dair-ai/building-rnns-is-fun-with-pytorch-and-google-colab-3903ea9a3a79) as follows: \"Building RNNs is Fun with PyTorch and Google Colab\". You can get pretty creative with the headline, but keep in mind that it is the face of your article and deserves plenty of tinkering before you finalize it."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "91oJq77cyui_",
        "colab_type": "text"
      },
      "source": [
        "## [Project Description]\n",
        "The project description marks the beginning of the tutorial you are writing. It should be clear, concise, and interesting. Here I suggest that you briefly explain what the notebook tutorial does (usually one sentence). Then you can explain what technologies you will be using in the tutorial (usually one sentence). You can also briefly explain what value or knowledge the reader will gain after finishing the tutorial (a short list or two sentences will do). In addition, you can give credit to any notable resources you are using, and briefly introduce yourself if the project description is not too lengthy."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "q_e7rSozyvon",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "### Here you usually import your libraries"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qvp3U7N41RBT",
        "colab_type": "text"
      },
      "source": [
        "If there is an important clarification that you need to make about any of the libraries imported, now is the right time to do so."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Rylpdsb21Wo_",
        "colab_type": "text"
      },
      "source": [
        "## [Data Loading]\n",
        "The first step of the pipeline in any data science tutorial is usually the data loading component. Besides visually describing the dataset to your audience, also try to briefly explain (in one or two sentences) where the data came from, i.e., the source of the data. Other specifications, like dimensions and attribute types, are important but can be neatly explained with examples using code and tools such as `pandas`."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "g1_euxvq1XQZ",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "### code for importing or downloading data"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Ia6remU_fV7W",
        "colab_type": "text"
      },
      "source": [
        "## [Data Exploration] \n",
        "Since you are teaching through writing and not actually live coding, resist the temptation to write code that does anything with the data, like transformation or feature engineering, before actually exploring it. This is a common mistake that should be avoided. You want to give the readers some idea about the data through basic statistics, plots, and figures. Practice this as much as you can, and it will become an important habit in your data science workflow. Your readers will also appreciate the courtesy."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "MwVV-MGG14UL",
        "colab_type": "text"
      },
      "source": [
        "## [Data Preprocessing]\n",
        "Although it is sometimes not necessary, as some datasets already come preprocessed, I believe it is important to briefly mention what preprocessing steps the data has undergone -- even if you need to do this through code examples. This should clear up any confusion that might otherwise arise during the modeling section of the tutorial. Remember, your audience wants to get a broad understanding of the data before the modeling component of the tutorial, so try to explain this part as clearly as possible with examples. Take advantage of your notebook features and other tools such as `matplotlib` and `pandas`."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "05S4Z52q156M",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "### code for preprocessing"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "uZ6eXpGl2WBt",
        "colab_type": "text"
      },
      "source": [
        "## [Constructing Model]\n",
        "If you are using tools such as PyTorch or TensorFlow for your data science projects, this section is reserved for the computation graph. Here you usually just state very briefly what you are building. No need to go into details just yet!"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "l7tEeylE2kSR",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "### code for model"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "GGGakX-l2o0s",
        "colab_type": "text"
      },
      "source": [
        "## [Testing Model]\n",
        "One of the things I have learned over the years is that everything in data science is better understood with examples, rather than just plain code or pictures. Before you begin training your model, make sure to explain to the reader what the model expects as input and what it is expected to output. Rendering code here with nice descriptions helps prepare the reader for what to expect when training the model, especially since the training code is usually longer than most other sections of the tutorial. With libraries like [PyTorch](https://pytorch.org/) and [DyNet](http://dynet.io/) this is fairly easy, since they build their computation graphs dynamically. TensorFlow also offers [eager](https://www.tensorflow.org/guide/eager) execution, which evaluates operations immediately: in TensorFlow 1.x you turn it on with `tf.enable_eager_execution()`, while in TensorFlow 2.x it is enabled by default. This is what's called imperative programming and I am glad they have it. It makes it easy to teach others about the beautiful things these tools are able to accomplish. I like to think that data science is about storytelling and discovery, and it should remain that way. Clear writing helps!"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "NtF8ipN93JTj",
        "colab_type": "text"
      },
      "source": [
        "## [Training Model]\n",
        "When training your models, you should specify what kind of optimization, hyperparameters, and data iteration methods you are using. To be honest, the training code is usually self-explanatory. If you did your job at the beginning, explaining your dataset and testing the model, this is probably the part of the tutorial that needs the least explanation. In my experience, most data computing libraries use similar training strategies, so the structure of the training code has become fairly standard. If any part of your training still needs clarification, you can always explain it beforehand."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "P2b6vtcU3ddJ",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "### Hyperparameters\n",
        "\n",
        "### Training code"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "C3Hf7M1r3isG",
        "colab_type": "text"
      },
      "source": [
        "## [Evaluating Model]\n",
        "And lastly, it is good practice to evaluate your models on some held-out samples of the dataset. This helps the reader get the gist of what the tutorial has covered. It also helps to re-emphasize the value the tutorial provides for the reader. This part of the tutorial is also the place to wrap up your thoughts and share insights with your readers. Readers love insights. You can share plots, a lot of examples, and even explore the parameters of the model."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "GiRH0DTd3u4N",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "### Evaluation code"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "plZ38XRC3zJu",
        "colab_type": "text"
      },
      "source": [
        "## [Final Words]\n",
        "You are not writing a book, so it is not necessary to have a conclusion section. In my experience, you use the final section to summarize all your findings and the future ideas you are working on. This is also a great time to congratulate the reader for making it to the end of the tutorial -- that's a huge achievement. It shows that you appreciate your readers. Then you can end the section with your favorite quote.\n",
        "\n",
        "And that's it! Congratulations on reaching the end of this primer. You are now more than equipped to deliver excellent tutorials to the whole data science community and to a wider audience. With this short primer, you should be able to reach thousands, and hopefully millions, of readers. Most importantly, you should be able to bring value to your readers and keep expanding the human knowledge base."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Ng8pGkkw8XB3",
        "colab_type": "text"
      },
      "source": [
        "## [References]\n",
        "Remember to always give credit where it is due. It shows that you are responsible and care about the long-term success of the community. Papers, other implementations, videos, code repositories, etc., are some of the things you should look to reference. If you don't want to include this very formal reference section, make sure to embed links throughout the tutorial as an alternative."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "vvQvwkAx4u5r",
        "colab_type": "text"
      },
      "source": [
        "Written with ❤️ by [dair.ai](https://medium.com/dair-ai)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "uvcxsIkL89AB",
        "colab_type": "text"
      },
      "source": [
        "## [Other Tips]\n",
        "- Try to ensure that your notebook-based tutorials have a very nice flow. If you are using a lot of functions, it is a good idea to put them in separate Python files and import them here. You don't want your notebooks to be too detailed, but you also don't want them to be too flat.\n",
        "- Remember! You are teaching, not dictating. Ask questions, immerse the reader, and challenge them. There are various ways to do so.\n",
        "- Be sure to add comments in your code. These should be very short and concise instructions: use a lot of action verbs, and avoid abstract nouns wherever possible. This tends to help those readers who prefer code over text. Another suggestion is to specify the dimensions of the data at each transformation step in the computation graph.\n",
        "- More coming soon!"
      ]
    }
  ]
}